Abstract

The purpose of this paper is to present two new alternatives to
the current goodness of fit methodology. With the increased use of
computerized adaptive testing (CAT), the ability to determine the accuracy of
calibrated item parameter estimates is paramount. The first method applies a
normalizing transformation to the logistic residuals to make them more
interpretable. The second method translates residuals directly into a loss of
information statistic. Both methods require a CAT simulation to accurately
assess the ability range over which an item would most likely be chosen.
Results suggest that the lack of fit in the logistic regression should not be
a major concern in developing a CAT item pool. Suggestions for further
research are made.
An Examination of the Relationship between Normalized Residual

and Item Information 



Computerized adaptive testing (CAT) has proven to be a powerful
alternative to traditional pencil and paper test administration (Green, Bock,
Humphreys, Linn, & Reckase, 1984). In the CAT process the computer selects
and administers only those items which yield the most information about an
examinee's current estimate of ability. Thus the length of a test and the
administration time can be shortened considerably without loss of
information. Specifically, after responding to an item, an examinee's ability
is estimated and an item which yields the most information at that ability is
subsequently selected and administered. Items which form CAT item pools are
usually selected from previous pencil and paper exams that have been
administered and calibrated. Since only a fraction of the items will be used
to estimate any one person's ability, a major concern which arises from this
process is the accuracy of the obtained parameter estimates. If item
parameter estimates do not accurately reflect the true parameters, the item
information and ultimately the CAT process will be inaccurate. Several
methods have been proposed to assess the goodness of fit of IRT parameter
estimates to item response data (Bock, 1972; Yen, 1981; Wright & Mead, 1977;
Bishop, Fienberg, & Holland, 1975). However, no one method appears to be
significantly better than the others (McKinley & Mills, 1985).

It is the purpose of this paper to briefly discuss one of the
shortcomings of the current goodness of fit methodology and to present the
findings for two new alternative methods which can be used as part of the
selection criteria for CAT item pools. The first alternative is the use of a
normalizing transformation proposed by Cox and Snell (1968) on the logistic
residuals so that their size and direction can be more readily interpreted.
The second alternative translates logistic residuals directly into a
"mis-information" value. Both of these alternatives rely upon the simulation of
the CAT process to accurately assess the ability range over which an item
would most likely be chosen.

Theoretical Background 

Several of the χ² goodness of fit statistics used in the research cited
above have questionable validity. Hambleton and Swaminathan (1985) discuss
some of the problems associated with χ² goodness of fit tests, including
determining the appropriate degrees of freedom, the asymptotic
nature of the test criteria, and the effect sample size has on the power of the
statistic.

Another problem is that the overall χ² value may not be indicative of an
item's usefulness to the CAT pool. For example, consider Tables 1 and 2, which
represent goodness of fit analyses of two items selected from the ACT
Assessment Program's math subtests.



Insert Tables 1 and 2 about here 



Table 1 shows Bock's (1972) goodness of fit analysis for item 14, whose
parameter estimates are â = 0.790, b̂ = 0.604, ĉ = .200. To compute the goodness
of fit statistic, the subjects were arranged in increasing order by ability estimate
and then divided equally into ten cells. The overall χ² goodness of fit value for
this item is 20.419 with p = .005. Using just this information, one might reject
the item for inclusion in a CAT pool because the estimated ICC appears not to fit
the response data very well. However, an important criterion which needs to be
considered is the ability range over which this item would most likely be
selected. If it were determined (e.g., through a CAT simulation) that item 14
would most likely be chosen in the θ range from -1.27 to .27, then the item
should be considered for the pool. That is, the estimated ICC describes the
response data quite accurately there, as can be seen from the low χ² values
for those deciles whose Min. θ and Max. θ are in this range.

Table 2 illustrates the opposite situation. The overall χ² value for
item 5 is 9.678 with p = 0.208. The low χ² value would suggest a good fit of
the model and parameter estimates to the item responses. However, if it were
determined that the item would probably be selected in the range from -2.99 to
-.83, then the inclusion of this item in the pool should be questioned, since
it is in this range that the estimated ICC provides the greatest lack of fit.

Thus the point to be made is that the overall goodness of fit statistic 
can be easily misinterpreted. A better understanding of the accuracy of the 
parameter estimates can be achieved by examining the fit of the model in the 
ability range in which an item is most likely to be chosen. 
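
The point can be illustrated by recomputing the cell contributions for item 14 from the values in Table 1. The sketch below assumes the usual Pearson form for each cell, N(O - E)² / [E(1 - E)]; the paper does not print the formula, so this form is an assumption, and the recomputed values differ slightly from the table because the tabled proportions are rounded to three decimals.

```python
# Cell values transcribed from Table 1 (item 14): (observed P, expected P, N).
cells = [
    (.293, .233, 263), (.305, .281, 262), (.322, .325, 264), (.398, .376, 264),
    (.428, .424, 264), (.477, .479, 264), (.449, .548, 263), (.643, .623, 263),
    (.725, .704, 262), (.867, .832, 264),
]

def cell_chi_square(observed, expected, n):
    """Pearson contribution of one ability cell (assumed form, see lead-in)."""
    return n * (observed - expected) ** 2 / (expected * (1.0 - expected))

per_cell = [cell_chi_square(*cell) for cell in cells]
total = sum(per_cell)

# Cells 2 through 6 cover the theta range from -1.27 to .27, the range in
# which item 14 is hypothesized to be selected.
in_range = sum(per_cell[1:6])

print(f"overall chi-square (all ten cells): {total:.3f}")
print(f"chi-square within the selection range: {in_range:.3f}")
```

Most of the overall value comes from cells 1 and 7, both of which lie outside the hypothesized selection range.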

Experiment 1 

Method 

Subjects. In the first experiment the normalizing transformation by Cox
and Snell (1968) was evaluated using simulated data. One thousand subjects
were randomly generated from a N(0,1) distribution. The mean ability of the
generated subjects was .01 with a standard deviation of .97.

Materials

An adaptive math test was simulated for each subject. The item pool was
composed of 100 items selected from the Mathematics Usage subtests of Forms
26A, 26B, and 26C of the ACT Assessment Program. The items were calibrated
using a three parameter logistic (3PL) IRT model by the computer program
LOGIST 5 (Wood, Wingersky, & Lord, 1982). The sizes of the calibration samples
were 2733, 2767, and 2825, respectively. The parameter estimates for the
items from 26B and 26C were rescaled to the scale defined by 26A.

In the CAT process an item was selected if it provided the maximum
information at the current estimated theta level. The first item
"administered to an examinee" in each of the CAT tests was selected based upon
the generated ability for that examinee. The testing was terminated if the
selected item had an information value < .3, or if the maximum number of items
(20) had been administered.
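
The selection and stopping rules just described can be sketched as follows. This is a minimal illustration, not the program used in the study: the scaling constant D = 1.7, the grid-search maximum likelihood ability estimator, the randomly generated item pool, and all function names are assumptions introduced here, since the paper does not specify them.

```python
import math
import random

D = 1.7  # logistic scaling constant (assumed)

def p3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info3pl(theta, a, b, c):
    """3PL item information at theta."""
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def ml_theta(administered, grid):
    """Grid-search maximum likelihood ability estimate (stand-in estimator)."""
    def loglik(t):
        total = 0.0
        for (a, b, c), correct in administered:
            p = p3pl(t, a, b, c)
            total += math.log(p) if correct else math.log(1.0 - p)
        return total
    return max(grid, key=loglik)

def simulate_cat(true_theta, pool, rng, max_items=20, min_info=0.3):
    """Return [(item index, theta estimate at selection), ...] for one examinee."""
    grid = [x / 10.0 for x in range(-40, 41)]
    theta_hat = true_theta  # the first item is chosen at the generated ability
    remaining = set(range(len(pool)))
    administered, log = [], []
    while remaining and len(log) < max_items:
        best = max(remaining, key=lambda k: info3pl(theta_hat, *pool[k]))
        if info3pl(theta_hat, *pool[best]) < min_info:
            break  # stop when the most informative item offers too little
        log.append((best, theta_hat))
        remaining.remove(best)
        correct = rng.random() < p3pl(true_theta, *pool[best])
        administered.append((pool[best], correct))
        theta_hat = ml_theta(administered, grid)
    return log

def selection_ranges(logs):
    """Minimum and maximum theta estimate at which each item was selected."""
    ranges = {}
    for log in logs:
        for item, theta in log:
            lo, hi = ranges.get(item, (theta, theta))
            ranges[item] = (min(lo, theta), max(hi, theta))
    return ranges

rng = random.Random(0)
# Hypothetical pool of 100 items as (a, b, c); not the ACT item parameters.
pool = [(rng.uniform(0.6, 1.8), rng.uniform(-2.0, 2.0), rng.uniform(0.1, 0.25))
        for _ in range(100)]
logs = [simulate_cat(rng.gauss(0.0, 1.0), pool, rng) for _ in range(100)]
ranges = selection_ranges(logs)
print(f"{len(ranges)} of {len(pool)} items were ever selected")
```

The per-item ranges returned by `selection_ranges` are the targeted ability ranges on which the later residual and mis-information analyses depend.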

Procedure 

Although helpful, the χ² goodness of fit measures represent rather gross
assessments of how well the data are actually fit by the estimated item
parameters. The statistic is usually computed based on deciles of the theta scale whose
expected value is determined using the mean or median theta value. "Underfit" or
"overfit" of the estimated ICC cannot be determined from the χ² value. Such
weaknesses can be overcome using a transformation developed by Cox and Snell
(1968). According to Cox and Snell, transformed differences between observed
and expected values can be normalized for each estimated theta level.
Logistic regression residuals can be transformed to normality according to the
following formula:



\[
\mathrm{RES}_i = \frac{n_i^{1/2}\left[\phi(y_i/n_i) - \phi\left\{P_i - \tfrac{1}{6}(1 - 2P_i)/n_i\right\}\right]}{P_i^{1/6}\,(1 - P_i)^{1/6}}
\tag{1}
\]

where RES_i = normalized residual at θ_i

      φ( ) is the incomplete beta function, I_u(2/3, 2/3)

      y_i is the number of correct responses to the item for θ_i

      n_i is the number of examinees with θ_i

      P_i is the probability of a correct response at θ_i using the
          3PL IRT model and the a, b, and c parameter estimates



Cox and Snell (1968) suggest that the obtained set of residuals is essentially
normally distributed for n_i as small as 5 and P_i = .04.
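
A minimal sketch of formula (1) follows. It takes φ(u) to be the unnormalized incomplete beta integral of t^(-1/3)(1 - t)^(-1/3) from 0 to u and evaluates it with Simpson's rule after the substitution t = s³; the numerical scheme, the function names, and the example counts are assumptions made for illustration and are not taken from the paper.

```python
import math

# Complete beta function B(2/3, 2/3), the value of phi at u = 1.
BETA_23 = math.gamma(2 / 3) ** 2 / math.gamma(4 / 3)

def phi(u, steps=400):
    """phi(u) = integral from 0 to u of t^(-1/3) (1 - t)^(-1/3) dt."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return BETA_23
    if u > 0.5:
        # The integrand is symmetric about 1/2.
        return BETA_23 - phi(1.0 - u, steps)
    # Substituting t = s^3 removes the singularity at 0; the integrand
    # becomes 3 s (1 - s^3)^(-1/3) on [0, u^(1/3)]. Simpson's rule, even steps.
    upper = u ** (1 / 3)
    h = upper / steps
    g = lambda s: 3.0 * s * (1.0 - s ** 3) ** (-1 / 3)
    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0

def normalized_residual(y, n, p):
    """Formula (1): y correct out of n examinees at a theta with model probability p."""
    shifted = p - (1.0 - 2.0 * p) / (6.0 * n)
    shifted = min(max(shifted, 0.0), 1.0)  # guard against extreme p with tiny n
    scale = p ** (1 / 6) * (1.0 - p) ** (1 / 6)
    return math.sqrt(n) * (phi(y / n) - phi(shifted)) / scale

# Hypothetical counts: 7 of 8 examinees correct where the model predicts .58.
res = normalized_residual(7, 8, 0.58)
print(f"normalized residual: {res:.3f}")
```

A positive value indicates that the model underestimates the observed proportion correct, and the value can be read on the z scale.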

This method has several advantages over the χ² goodness of fit
measures. First, a normalized residual may be obtained for each estimated
ability level (provided n_i > 5). This eliminates the concern over the optimum
number of categories to use in grouping ability levels, and over condensing observed
and expected value information into a single value per decile.

Secondly, residuals are signed numbers; thus a positive residual would
imply that the model underestimates the observed proportion correct, while a
negative residual would imply that the model overestimates it. Thirdly, because the
residuals are normalized, their importance can be readily interpreted as z
scores.

For each generated examinee the item and current ability estimate were
noted for each simulated test. Following the simulation, the ability range
over which each item was selected was calculated. Residuals, using the
respective calibration response data and item parameter estimates, were then
normalized for each theta having n > 5 in the selected ability range.
Results 

Results of the simulation are reported in Table 3. The average length of
a simulated test was 17 items. However, of the 100 items in the item pool,
only 48 items were selected. The parameter estimates and the χ² goodness of
fit values (using the original calibration samples) for these items are
reported in Table 4.



Insert Tables 3 and 4 about here 



The position and number of times each item was selected are reported in
Table 5. By examining this type of table one can gain a better understanding
of the item selection process using specified items. For example, it can be
seen that the number of items selected increases gradually from 10 in the
first position to 48 in the twentieth position.

The minimum and maximum theta values for which each of the 48 chosen
items was selected and the number of times each item was selected are shown
in Table 6. The average theta range for each of the 48 items is 1.65. The
number of times each item was "administered" ranged from 6 (item 38) to 798
(item 23).

The average theta range for items selected in the first position was 
.730. The size of the average theta range for the twenty positions varies 
from .518 for items selected in the second and nineteenth positions to .933 
for items selected in the seventh position. 



Insert Tables 5 and 6 about here 



No significant residuals (p < .05, |RES| > 1.96) were found at any of the
theta values for any of the selected items. The average of the absolute value
of the normalized residuals, |RES|, for each item is reported in the last
column of Table 4. This average ranges from .308 to .719 with the majority
lying between .31 and .35.

Discussion 

The "lack of sensitivity" of the normalizing process to detect
significant residuals is partly due to the average n per theta. By
rearranging formula (1), it can be shown that the Cox and Snell (1968)
transformation is dependent on sample size. That is, the larger the sample
size per theta, the narrower the confidence band becomes around the estimated
ICC. This is illustrated graphically in Figure 1. In Figure 1 the 95%
confidence bands around the ICC for item 1 (a = .89, b = -.99, c = .16) for
n = 10, 50 and 100 are plotted. These confidence bands were calculated by
rearranging (1) and solving for the observed p, y_i/n_i, when n_i and RES_i are
specified. The formula to compute the upper 95% confidence limit when n = 10
is given as

\[
\frac{y_i}{n_i} = \phi^{-1}\left[\frac{1.96\,P_i^{1/6}\,(1 - P_i)^{1/6}}{10^{1/2}} + \phi\left\{P_i - \tfrac{1}{6}(1 - 2P_i)/10\right\}\right]
\tag{2}
\]



Insert Figure 1 about here 



It can be seen that of the 15 residuals within the targeted ability
range for item 1, none was beyond the 95% confidence band until n = 50. This
is considerably above the average of 8 subjects per theta used in this
study. If the sample size were at least this large, two thirds of these
residuals would have been significant at p < .05. (However, it is
questionable that such residuals would exist with such a large sample size.)
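
The dependence of the band on n can be checked numerically by solving the normalized residual formula for the observed proportion at RES = 1.96. The sketch below inverts φ by bisection; the Simpson's rule integration, the bisection inverse, the function names, and the example probability of .58 are assumptions made for illustration, not the procedure used to draw Figure 1.

```python
import math

BETA_23 = math.gamma(2 / 3) ** 2 / math.gamma(4 / 3)  # phi(1) = B(2/3, 2/3)

def phi(u, steps=400):
    """Integral from 0 to u of t^(-1/3) (1 - t)^(-1/3) dt (Simpson, t = s^3)."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return BETA_23
    if u > 0.5:
        return BETA_23 - phi(1.0 - u, steps)
    upper = u ** (1 / 3)
    h = upper / steps
    g = lambda s: 3.0 * s * (1.0 - s ** 3) ** (-1 / 3)
    total = g(0.0) + g(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0

def phi_inverse(value):
    """Invert phi on [0, 1] by bisection (phi is increasing)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if phi(mid) < value:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def upper_limit(p, n, z=1.96):
    """Observed proportion at which the normalized residual equals z."""
    shifted = min(max(p - (1.0 - 2.0 * p) / (6.0 * n), 0.0), 1.0)
    target = z * p ** (1 / 6) * (1.0 - p) ** (1 / 6) / math.sqrt(n) + phi(shifted)
    return 1.0 if target >= BETA_23 else phi_inverse(target)

p_model = 0.58  # hypothetical model probability at one theta
limits = {n: upper_limit(p_model, n) for n in (10, 50, 100)}
for n, limit in limits.items():
    print(f"n = {n:3d}: upper 95% limit on the observed proportion = {limit:.3f}")
```

The limit moves toward the model probability as n grows, which is why a residual of a given size can be nonsignificant at n = 10 and significant at n = 50.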

The correlation between the χ² goodness of fit value and |RES| was
-.177. This lack of linear relationship is probably due, in part, to the
sensitivity of the normalized residual analysis, which appears to detect a
great deal of the "noise" common to any regression analysis. However, the
advantages thought to be gained by this technique may not be realized
unless a larger calibration sample is used (i.e., large enough to yield an n
per theta > 50).
Experiment 2

Since item selection within the adaptive testing process is directly
related to the amount of information each item provides, it was decided in the
second part of this study to investigate how much information would be lost if
the estimated parameters were changed to fit the observed calibration data
exactly within the targeted theta range.

Method 

Subjects and Materials 

The simulated examinee and adaptive test results used in Experiment 1 
were also used in Experiment 2. That is, the 1000 generated subjects, the 
results of their simulated adaptive tests, and the calibration data were 
reanalyzed in the second part of this study. 

Design and Procedure 

To investigate the amount of mis-information which occurs from the lack
of fit, the following statistic was derived:

\[
\mathrm{MIS}_j = \frac{1}{k}\sum_{i=1}^{k}\left|I_{b_j} - I_{b_i}\right|
\]

where MIS_j = mis-information statistic for item j

      I_{b_j} = item information value for item j at theta "i" using the
                originally calibrated difficulty parameter "b_j"

      I_{b_i} = item information value using the adjusted b_i value for theta "i"

      k = number of thetas (n > 5) in the ability range in which item j was
          selected

The following steps were performed to calculate the MIS_j statistic for
each item:

Step 1: For each of the thetas (n > 5) in the ability range for which an
item was selected in the CAT simulation, the observed proportion correct in
the calibration sample was computed.

Step 2: Using the observed proportion correct, a new difficulty b_i was
calculated for each θ_i. This b_i is what the difficulty would have to be if
the observed p were to become the expected p of the estimated ICC. That
is, b_i is selected so that the new ICC would pass through the observed p value
at the given θ_i. An assumption is made that the a and c parameters would
remain as originally estimated. If the observed proportion correct < c, then
the largest displaced b_i is used, since obviously no new ICC could be
created. This process is represented graphically by the dotted lines in
Figure 2, which denote the new ICCs passing through the residuals > c for
item 1.

Insert Figure 2 about here



Step 3: For each theta in the targeted range the information function
using b_j and b_i was determined. The difference between these values at their
respective theta levels was then calculated. (Note: the difference between
the information functions can be either positive or negative.

      If |θ_i - b_j| > |θ_i - b_i| then I_{b_j} - I_{b_i} < 0.

      If |θ_i - b_j| < |θ_i - b_i| then I_{b_j} - I_{b_i} > 0.)

This is because the information function will be a maximum near the
point θ = b. (See Lord 1980, p. 152 for the exact point where the maximum of a
3PL model occurs.)

A graphical representation of this analysis can be seen in Figure 3. The
original information function is represented by the dark thick curve, while
the two adjusted ones are represented by chain dotted curves. In one
instance (b_i = -.45, b_j = -.99, θ_i = -1.05) the difference can be seen to be
positive (dashed line) and in the other (b_i = -1.46, b_j = -.99, θ_i = -1.41)
negative (dotted line).



Insert Figure 3 about here 



Step 4: The mean of the absolute value of the
difference, |I_{b_j} - I_{b_i}|, was calculated over the thetas in the targeted
range.



To evaluate this statistic the following ratio was formed:

\[
\mathrm{AMIR}_j = \frac{\bar{I}_j - \mathrm{MIS}_j}{\bar{I}_j}
\]

where AMIR_j = average mis-information ratio for item j

      Ī_j = the average item information value provided by item j in the
            simulated CAT



The AMIR ratio is formed in the following manner. The item information value
at each estimated theta was calculated for each item and averaged over the
number of times the item was selected in the simulated CAT. The
mis-information value calculated using the calibration sample was then subtracted
from the average information provided when the item was selected in the
simulated CAT. The difference was then divided by the average item
information value provided in the simulated CAT. Notice that the AMIR_j ratio
will be negative only if an item provides more average mis-information than
average information in the theta range in which the item is selected. Thus it
is believed that if AMIR_j < 0, the lack of fit of the 3PL model to the item
response data provides sufficient mis-information that the item should probably
not be included in a CAT item pool.
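
Steps 1 through 4 and the ratio can be sketched as below. The closed form for the adjusted difficulty follows from solving the 3PL model for b with a and c held fixed. The scaling constant D = 1.7, the reading of "largest displaced b_i" as the valid adjusted difficulty farthest from b_j, the handling of an observed proportion of 1, and the example thetas, proportions, and average information are all assumptions introduced for illustration.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info3pl(theta, a, b, c):
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def adjusted_b(theta, p_obs, a, c):
    """Difficulty that makes the ICC pass through (theta, p_obs), or None.

    No such ICC exists when p_obs <= c; p_obs = 1 is treated the same way
    here (an assumption, since the paper does not discuss that case).
    """
    if p_obs <= c or p_obs >= 1.0:
        return None
    return theta - math.log((p_obs - c) / (1.0 - p_obs)) / (D * a)

def mis(thetas, p_obs, a, b, c):
    """Mean absolute information difference over the thetas in the range."""
    adjusted = [adjusted_b(t, p, a, c) for t, p in zip(thetas, p_obs)]
    valid = [x for x in adjusted if x is not None]
    if not valid:
        raise ValueError("no theta has an observed proportion above c")
    # Assumed reading of "largest displaced b_i": the valid value farthest from b.
    fallback = max(valid, key=lambda x: abs(x - b))
    diffs = []
    for t, b_i in zip(thetas, adjusted):
        b_i = fallback if b_i is None else b_i
        diffs.append(abs(info3pl(t, a, b, c) - info3pl(t, a, b_i, c)))
    return sum(diffs) / len(diffs)

def amir(mean_info, mis_value):
    """Average mis-information ratio: (mean information - MIS) / mean information."""
    return (mean_info - mis_value) / mean_info

# Item 1 parameter estimates from the text; the thetas, observed proportions,
# and average CAT information below are hypothetical.
a, b, c = 0.89, -0.99, 0.16
thetas = [-1.41, -1.05, -0.80]
observed = [0.45, 0.40, 0.62]
mis_value = mis(thetas, observed, a, b, c)
print(f"MIS = {mis_value:.3f}, AMIR = {amir(0.45, mis_value):.3f}")
```

When the observed proportions coincide with the model probabilities, every adjusted difficulty equals b_j, MIS is zero, and AMIR equals 1; AMIR falls below zero only when MIS exceeds the average information.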

Results and Discussion



The MIS_j values computed using the ability range provided by the
simulated adaptive tests and the 3PL parameter estimates and item responses
from the calibration samples are shown in Table 7.



Insert Table 7 about here 



The range of the MIS_j values for the 48 selected items is .12 to .56 with an
average of .28. The AMIR ratio for each selected item is also shown in
Table 7. The average of the AMIR ratios was .53 with a standard deviation of
.36.

Only one item, item 45, has an AMIR_j ratio less than zero. However, this
item was selected in a range from -4.05 to -1.43, and only four thetas (n > 5)
could be found in this range in the calibration sample. Item 45 was the
easiest item (b = -1.552) of the 100 items in the pool, and yet it was
providing more "mis-information" than information in the selected range. Thus
in this case the negative AMIR value could be interpreted as an index
describing the item pool, suggesting a need for more easy items. An important
aspect which needs to be considered is accurately specifying the targeted test
population for the CAT simulation. That is, if it were expected that only high
ability students would be taking the CAT, then there would probably be no
concern for adding more low difficulty items.

General Discussion

The best way to understand whether the items collected for a CAT item pool
provide effective measurement in the targeted ability range is to simulate the
adaptive testing process. Simulation enables one to determine the range over
which an item is most likely to be chosen, and thus provides a better
interpretation of χ² goodness of fit analyses.

The method of selection used in the simulated adaptive test for this
study was to select the item which provided the most information at the
current estimated ability level; however, other methods do exist (see Hulin,
Drasgow, & Parsons, 1983, pp. 226-230).

Hulin, Drasgow, and Parsons (1983) reported similar findings about the
number of items selected in the CAT process. Their results, using the maximum
information method, revealed that only 119 items out of a 260 item pool were
ever selected. These findings, when viewed in concert with the results of this
study, suggest that if the criterion for item selection in the CAT
process is to select the item which provides the maximum information, over
half of the items may never be used.

If one chooses this method of item selection, two concerns arise. First,
the items which are selected (e.g., in a simulation) need to be checked for
the degree of mis-information each provides due to lack of logistic fit.
Second, how shall those items not selected be evaluated? One possible
solution would be to avoid this problem by restricting (e.g., for security
reasons) the number of times an item can be administered. For example, in the
present study each of the 48 items chosen in the CAT simulation was selected
an average of 355 times. The minimum number of times each item could be
selected so that all 100 items were used equally would be 10 times. However,
the standard error of the ability estimates would be greatly disproportionate
between the first examinees and the last examinees due to the lack of informative
items remaining in the pool. Thus, if a selection restriction is placed upon
the items in the pool, it is necessary to have a large enough item pool to
provide accurate ability estimates for the entire test population over the
targeted range.

The normalized residual transformation, although more directly
interpretable than a χ² goodness of fit test, appears to be an infeasible
approach because of the large sample sizes needed. Such large samples might
make the cost of the calibration process prohibitive.

The results obtained using the AMIR_j ratio suggest that the lack of fit
in the logistic regression process should not be a major concern in the
selection process for a CAT item pool. Of the 48 items selected in the CAT
simulation, 10 would have been rejected outright if the selection criterion were
a χ² goodness of fit value whose p < .05. Using the AMIR ratio only one item
was flagged, and this was due in part to the lack of very easy items in the
pool rather than to a faulty item. These results are promising when one
realizes the time, effort and expense put into item development.

The correlation between the AMIR ratio and the χ² goodness of fit value
for the 48 items was only .131, suggesting little linear relationship between
the two. Normalized residuals, however, do correlate quite highly with the
AMIR ratios, r = -.776.

More research needs to be conducted to validate the concerns and new
approaches presented in this paper. Mis-information analyses need to be
conducted using other methods of item selection. Hopefully the problems which
plague goodness of fit analyses may prove to be unwarranted.

Table 1. Bock's Goodness of Fit Analysis for Item 14

LOGIST Parameter Estimates:  a = 0.790   b = 0.604   c = 0.200

        Observed  Expected  Cell              Min      Max      Median
Cell    P         P         N       χ²        Theta    Theta    Theta

 1      .293      .233      263.     5.187    -2.99    -1.27    -1.73
 2      .305      .281      262.     0.756    -1.27    -0.83    -1.02
 3      .322      .325      264.     0.013    -0.83    -0.48    -0.65
 4      .398      .376      264.     0.544    -0.48    -0.23    -0.34
 5      .428      .424      264.     0.019    -0.23     0.02    -0.10
 6      .477      .479      264.     0.004     0.02     0.27     0.14
 7      .449      .548      263.    10.524     0.28     0.57     0.41
 8      .643      .623      263.     0.425     0.57     0.83     0.69
 9      .725      .704      262.     0.567     0.83     1.21     1.00
10      .867      .832      264.     2.380     1.22     2.96     1.59

Note: Bock's chi squared goodness of fit total is 20.419 with 7.0 degrees of
freedom, P = 0.005

Table 2. Bock's Goodness of Fit Analysis for Item 5

LOGIST Parameter Estimates:  a = 1.020   b = -0.511   c = 0.160

        Observed     Expected  Cell                      Min      Max      Median
Cell    P            P         N            χ²           Theta    Theta    Theta

 1      .189         .250      264.         5.239        -2.99    -1.27    -1.73
 2      .447         .406      264.         1.865        -1.27    -0.83    -1.02
 3      [illegible]  .530      263.         0.342        -0.83    -0.48    -0.65
 4      .654         .642      263.         0.171        -0.48    -0.23    -0.34
 5      .712         .724      [illegible]  [illegible]  -0.23     0.02    -0.10
 6      .795         .795      264.         0.001         0.02     0.27     0.14
 7      .856         .859      263.         0.020         0.28     0.57     0.41
 8      .890         .907      264.         0.885         0.57     0.83     0.69
 9      .955         .943      265.         0.672         0.83     1.21     1.00
10      .974         .979      266.         0.308         1.22     2.96     1.59

Note: Bock's chi squared goodness of fit total is 9.678 with 7.0 degrees of
freedom, P = 0.208

Table 3. Descriptive Statistics of the CAT Simulation

Examinees were generated randomly from a N(0,1) distribution

    N = 1000
    Mean θ = .01        Minimum = -4.05
    S.D. θ = .97        Maximum = 3.80

Estimated abilities

    Mean θ = -.02       Minimum = -4.05      S.E. = .412
    S.D. θ = 1.35       Maximum = 3.87

Length of simulated tests

    Mean = 17.00        Minimum = 1
    S.D. = 6.25         Maximum = 20








Table 4. Parameters and Goodness of Fit Statistics for the Items Selected in
the CAT Simulation

[Table body not legible in the source scan. Columns: item number, a, b, c,
χ² goodness of fit value, p, and average |RES|.]

*** Note: items for which no residuals (n > 5) were found in the calibration sample
in the ability range in which the item was selected




faljle 5. fcsirjci I tea Selection 



ITEM 


1 






4 


cr 












^ i 




i ~ 


14 


15 


lb 




# u 






1 


A 


o 




j 


t \ 








i 




1 










T 




c 






n 


,- 


14 




24 






14 






r 


1 4 




Q 
Q 




i V 


1 7 




1 7 
1 J 




i -* 


5 


0 


A 


o 


0 


126 


48 


127 
1 x / 


2 y 


~y 

A. 7 


0 




i. 7 




- r 
J 


p 




y 




b 


! ] 
i 1 




o 


q 




(j 


j} 


"A 


*. 0 


4 7 


JO 


/ fc 


t r 
l J 


i i. i 


/ 7 


Q J 


U4 


l 0 


| 7 


i 0 




n L 

*_b 


a 

7 




fl 

V 


A 
V 


o 


u 


U 


A 




A 


1 
i 


1 1 
l i 


i ^ 






g7 


' 40 


1 ! A 

1 i T 


u J 


7 / 




12 


o 


o 


o 


o 




0 


0 


r. 


0 




:) 


(t 




l) 


1 


11 


5 




c 


10 


21 


o 


i; 


o 


y 


m 


(i 




(j 


tj 


o 




ij 

V 


y 


f) 


A 


A 


A 

V 


b 


j 7 


JT 




43 


26 


9 


284 


29 


71 


56 


71 


13 


54 


36 


19 


17 


1 J 




15 


12 


\ 


17 


X 
0 


25 


72 


59 


99 


33 


27 


26 


40 


31 


33 


12 


25 


IB 


20 




7 


0 


j 


1 
t 


X 


7 
j 


26 




0 


0 


0 


\j 


0 


0 


0 


0 


0 


0 


0 


0 


o 


0 


29 


48 


76 


77 


1 17 


23 


o 


o 


o 


A 


o 


o 


; < 


"» r 


"i 

-•u 


29 


A7 


17 


A 4 


jj 




Al 


?R 

xu 


4 1 

T i 


"» r 

J J 


X 4 * 


29 


0 


fi 


1 ) 


\j 


o 


1) 


Ij 




1 J 


A 


j) 


j 






Table 6. Minimum and Maximum Values for which Each Selected Item Was Chosen

Item          N*            Min θ         Max θ

1             [illegible]   [illegible]   -1.05
2             [illegible]   [illegible]   -0.20
[illegible]   [illegible]   -1.21          0.52
[illegible]   [illegible]   -1.04          0.94
[illegible]   [illegible]   [illegible]   [illegible]
[illegible]   [illegible]   -1.42         -0.82
21            [illegible]   [illegible]    1.23
[illegible]   [illegible]   [illegible]    1.77
[illegible]   [illegible]   [illegible]    1.73
[illegible]   (349)         [illegible]    1.78
28            [illegible]   -1.20          0.99
29            (395)         -1.19          1.09
[illegible]   [illegible]   -0.21          1.94
32            [illegible]   [illegible]    1.77
33            (392)          0.46          2.02
35            (248)          0.70          2.20
36            (596)          0.03          1.86
37            (32)           1.68          2.25
38            (6)            2.31          2.31
39            (651)         -0.18          1.86
40            (259)          0.78          3.87
41            (160)         -1.64         -0.29
44            (397)         -2.43         -0.27
45            (111)         -4.05         -1.43
48            (299)         -1.67         -0.03
49            (70)          -1.23         -0.62
53            (332)         -1.27          0.18
54            (418)         -0.59          0.89
56            (605)         -1.05          0.63
57            (18)           1.81          2.29
59            (691)         -1.23          0.80
60            (131)         -0.79          0.08
63            (581)         -0.01          2.04
67            (420)          0.19          2.23
68            (160)          1.18          2.15
70            (358)          0.66          2.41
73            (84)           1.25          2.09
74            (659)         -0.33          2.14
75            (173)          0.99          3.33
76            (425)          0.42          1.82
77            (185)          1.10          3.80
78            (771)         -0.97          2.05
84            (113)         -1.35         -0.57
85            (592)         -1.66          0.42
86            (183)         -2.07         -0.55
87            (173)         -1.56         -0.21
90            (27)          -1.62         -0.99
91            (392)         -1.32          0.38

*N = the number of times the item was selected in the CAT simulation.
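
A summary of the kind shown in Table 6 could be tabulated from a simulation log along the following lines. This is a minimal sketch, not the procedure used in the study: the log format (one pair of item number and ability estimate per administration), the function name `selection_ranges`, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict


def selection_ranges(log):
    """Summarize a CAT simulation log.

    `log` is assumed to be an iterable of (item_id, theta) pairs, one pair
    for each time an item was selected, where theta is the ability estimate
    at which the selection occurred. Returns {item_id: (n, min_theta, max_theta)}.
    """
    thetas_by_item = defaultdict(list)
    for item_id, theta in log:
        thetas_by_item[item_id].append(theta)
    return {
        item_id: (len(values), min(values), max(values))
        for item_id, values in sorted(thetas_by_item.items())
    }


# Hypothetical log entries, not data from the study.
example_log = [(102, 0.40), (101, -1.10), (102, 1.25), (102, 0.85)]

for item_id, (n, theta_min, theta_max) in selection_ranges(example_log).items():
    print(f"{item_id:>4}  ({n})  {theta_min:6.2f}  {theta_max:6.2f}")
```

An item selected at only one ability value would show identical minimum and maximum entries.
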
Table 7. Mis-Information and Average Mis-Information Ratio Values
for the Items Selected in the Simulated CAT.



Item MIS AMIR Item MIS AMIR 

*** Denotes items for which MIS and AMIR could not be calculated because there
were no thetas in the selected range in the calibration samples.
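
The condition in the note above can be checked before attempting either statistic. The sketch below is only an assumed illustration: the function name `has_thetas_in_range` and the sample values are hypothetical, and the computation of MIS and AMIR themselves is not shown.

```python
def has_thetas_in_range(calibration_thetas, theta_min, theta_max):
    """Return True if any calibration-sample theta lies in [theta_min, theta_max].

    theta_min and theta_max are assumed to be the bounds of the range over
    which the item was selected in the CAT simulation.
    """
    return any(theta_min <= theta <= theta_max for theta in calibration_thetas)


# Hypothetical calibration-sample abilities, not data from the study.
sample_thetas = [-1.2, -0.4, 0.3, 1.1]

print(has_thetas_in_range(sample_thetas, 0.0, 0.5))
print(has_thetas_in_range(sample_thetas, 2.0, 2.5))
```

An item for which the check fails would be flagged rather than assigned MIS and AMIR values.
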

Figure 1. 95% Confidence Intervals Placed About the ICC for Item 1
Using Three Different Sample Sizes

Figure 2. Adjusted ICC's Passing Through the Residuals in the Selected
Range for Item 1

Figure 3. Mis-Information Analysis for Item



